---
title: Configuration Reference
description: Learn how to configure TensorZero Evaluations.
---

<Tip>

The configuration for TensorZero Evaluations should go in the same `tensorzero.toml` file as the rest of your TensorZero configuration.

</Tip>

## `[evaluations.evaluation_name]`

The `evaluations` sub-section of the config file defines the behavior of an evaluation in TensorZero.
You can define multiple evaluations by including multiple `[evaluations.evaluation_name]` sections.

If your `evaluation_name` is not a valid TOML bare key, wrap it in quotation marks.
For example, periods are not allowed in bare keys, so you can define an evaluation named `foo.bar` as `[evaluations."foo.bar"]`.

```toml mark="email-guardrails"
# tensorzero.toml
[evaluations.email-guardrails]
# ...
```

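As a sketch of the quoting rule above: TOML treats an unquoted period as a table separator, so only the quoted form keeps `foo.bar` as a single evaluation name. The name `foo.bar` is purely illustrative.

```toml
# tensorzero.toml

# Quoted: a single evaluation named `foo.bar`
[evaluations."foo.bar"]
# ...

# Unquoted, `[evaluations.foo.bar]` would instead be parsed by TOML
# as a table `bar` nested inside a table `foo`.
```
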
### `type`

- **Type:** Literal `"inference"` (we may add other options here later on)
- **Required:** yes

### `function_name`

- **Type:** string
- **Required:** yes

The name of a function defined in the `[functions]` section of the gateway configuration.
The evaluation runs against this function.

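A minimal sketch of `type` and `function_name` together, assuming a function named `draft-email` is defined elsewhere in the same file (the function name is hypothetical):

```toml
# tensorzero.toml
[functions.draft-email]
# ... function configuration ...

[evaluations.email-guardrails]
type = "inference"
function_name = "draft-email"  # must match a function defined under `[functions]`
```
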
### `[evaluations.evaluation_name.evaluators.evaluator_name]`

The `evaluators` sub-section defines the behavior of a particular evaluator that will be run as part of its parent evaluation.
You can define multiple evaluators by including multiple `[evaluations.evaluation_name.evaluators.evaluator_name]` sections.

If your `evaluator_name` is not a valid TOML bare key, wrap it in quotation marks.
For example, periods are not allowed in bare keys, so you can define an evaluator named `includes.jpg` as `[evaluations.evaluation_name.evaluators."includes.jpg"]`.

```toml mark="draft-email"
# tensorzero.toml
[evaluations.email-guardrails]
# ...

[evaluations.email-guardrails.evaluators."includes.jpg"]
# ...

[evaluations.email-guardrails.evaluators.check-signature]
# ...
```

#### `type`

- **Type:** string
- **Required:** yes

Defines the type of the evaluator.

TensorZero currently supports the following evaluator types:

| Type          | Description                                                                                                              |
| :------------ | ------------------------------------------------------------------------------------------------------------------------ |
| `llm_judge`   | Uses a TensorZero function as a judge.                                                                                   |
| `exact_match` | Checks whether the generated output exactly matches the reference output (skips datapoints without a reference output). |

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
# ...
```

<Accordion title='type: "exact_match"' defaultOpen="true">

###### `cutoff`

- **Type:** float
- **Required:** no

Sets a user-defined threshold that the evaluator's average score must reach for the run to pass.
This can be useful when evaluations are run as an automated test.
If the average value of this evaluator is below the cutoff, the evaluations binary will return a nonzero status code.

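For instance, a sketch of an `exact_match` evaluator used as an automated check. The evaluator name `matches-reference` and the threshold value are illustrative:

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.matches-reference]
type = "exact_match"
cutoff = 0.9  # nonzero exit status if the average value falls below 0.9
```
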
</Accordion>

<Accordion title='type: "llm_judge"' defaultOpen="true">

###### `input_format`

- **Type:** string
- **Required:** no (default: `serialized`)

Defines the format of the input provided to the LLM judge.

- `serialized`: Passes the input messages, generated output, and reference output (if included) as a single serialized string.
- `messages`: Passes the input messages, generated output, and reference output (if included) as distinct messages in the conversation history.

<Tip>

We only support evaluations with image data when `input_format` is set to `messages`.

</Tip>

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
input_format = "messages"
# ...
```

###### `output_type`

- **Type:** string
- **Required:** yes

Defines the expected data type of the evaluation result from the LLM judge.

- `float`: The judge is expected to return a floating-point number.
- `boolean`: The judge is expected to return a boolean value.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
output_type = "float"
# ...
```

###### `include.reference_output`

- **Type:** boolean
- **Required:** no (default: `false`)

If set to `true`, the reference output associated with the evaluation datapoint will be included in the input provided to the LLM judge.
In this case, the evaluator is skipped for datapoints that have no reference output.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
include = { reference_output = true }
# ...
```

###### `optimize`

- **Type:** string
- **Required:** yes

Defines whether the metric produced by the LLM judge should be maximized or minimized.

- `max`: Higher values are better.
- `min`: Lower values are better.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max"
# ...
```

###### `cutoff`

- **Type:** float
- **Required:** no

Sets a user-defined threshold that the evaluator's average score must satisfy for the run to pass.
This can be useful when evaluations are run as an automated test.
If the average value of this evaluator is below the cutoff (when `optimize` is `max`) or above the cutoff (when `optimize` is `min`), the evaluations binary will return a nonzero status code.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max" # Example: Maximize score
cutoff = 0.8 # Example: Consider passing if average score is >= 0.8
# ...
```

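The example above covers `optimize = "max"`. When `optimize` is `min`, the comparison flips. A sketch, using a hypothetical evaluator name `typo-count` and an illustrative threshold:

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.typo-count]
# ...
type = "llm_judge"
output_type = "float"
optimize = "min"  # lower scores are better
cutoff = 2.0      # nonzero exit status if the average score is above 2.0
# ...
```
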
###### `[evaluations.evaluation_name.evaluators.evaluator_name.variants.variant_name]`

An LLM Judge evaluator defines a TensorZero function that is used to judge the output of another TensorZero function.
Therefore, all the variant types that are available for a normal TensorZero function are also available for LLM judges, including all of our [inference-time optimizations](/gateway/guides/inference-time-optimizations/).

You can include a standard [variant configuration](/gateway/configuration-reference/#functionsfunction_namevariantsvariant_name) in this block, with two modifications:

- You must mark a single variant as `active`.
- For `chat_completion` variants, you must provide `system_instructions` (a text file) instead of a `system_template`; no other templates are accepted.

Here we list only the configuration for variants that differs from the configuration for a normal TensorZero function. Please refer to the [variant configuration reference](/gateway/configuration-reference/#functionsfunction_namevariantsvariant_name) for the remaining options.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max"

[evaluations.email-guardrails.evaluators.check-signature.variants."claude3.5sonnet"]
type = "chat_completion"
model = "anthropic::claude-sonnet-4-5-20250929"
temperature = 0.1
system_instructions = "./evaluations/email-guardrails/check-signature/system_instructions.txt"
# ... other chat completion configuration ...

[evaluations.email-guardrails.evaluators.check-signature.variants."mix3claude3.5sonnet"]
active = true  # if we run the `email-guardrails` evaluation, this is the variant we'll use for the check-signature evaluator
type = "experimental_mixture_of_n"
candidates = ["claude3.5sonnet", "claude3.5sonnet", "claude3.5sonnet"]
```

###### `active`

- **Type:** boolean
- **Required:** defaults to `true` if only one variant is configured; otherwise, exactly one variant must set this field to `true`

Sets which of the variants should be used for evaluation runs.

```toml
# tensorzero.toml
[evaluations.email-guardrails.evaluators.check-signature]
# ...

[evaluations.email-guardrails.evaluators.check-signature.variants."mix3claude3.5sonnet"]
active = true # if we run the `email-guardrails` evaluation, this is the variant we'll use for the check-signature evaluator
type = "experimental_mixture_of_n"
```

###### `system_instructions`

- **Type:** string (path)
- **Required:** yes

Defines the path to the system instructions file.
This path is relative to the configuration file.

This should be a text file containing the system instructions for the LLM judge.
These instructions should tell the judge to output a float or boolean value.
We use JSON mode to enforce that the judge returns a JSON object of the form `{"thinking": "<thinking>", "score": <float or boolean>}`, where the type of `score` matches the evaluator's `output_type`.

```text title="evaluations/email-guardrails/check-signature/claude_35_sonnet/system_instructions.txt"
Evaluate if the text follows the haiku structure of exactly three lines with a 5-7-5 syllable pattern, totaling 17 syllables. Verify only this specific syllable structure of a haiku without making content assumptions.
```

```toml
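# tensorzero.toml
# `system_instructions` is set on a variant of the evaluator, not on the evaluator itself
[evaluations.email-guardrails.evaluators.check-signature.variants."claude3.5sonnet"]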
# ...
system_instructions = "./evaluations/email-guardrails/check-signature/claude_35_sonnet/system_instructions.txt"
# ...
```

</Accordion>

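Putting the pieces together, here is a sketch of a complete evaluation with one LLM judge evaluator and a single variant. It uses only the fields documented on this page. The function name `draft-email`, the variant name `judge`, the cutoff value, and the file path are illustrative; the model name is the one used in the variant example above.

```toml
# tensorzero.toml
[evaluations.email-guardrails]
type = "inference"
function_name = "draft-email"  # hypothetical function defined under `[functions]`

[evaluations.email-guardrails.evaluators.check-signature]
type = "llm_judge"
input_format = "serialized"  # the default, shown for clarity
output_type = "float"
optimize = "max"
cutoff = 0.8  # nonzero exit status if the average score is below 0.8

[evaluations.email-guardrails.evaluators.check-signature.variants.judge]
# `active` can be omitted because this is the only variant
type = "chat_completion"
model = "anthropic::claude-sonnet-4-5-20250929"
system_instructions = "./evaluations/email-guardrails/check-signature/system_instructions.txt"
```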